Abstract

We study the component structure in random intersection graphs with tunable
clustering, and show that the average degree works as a threshold for a phase
transition for the size of the largest component. That is, if the expected degree is less
than one, the size of the largest component is a.a.s. of logarithmic order, but if the
average degree is greater than one, a.a.s. a single large component of linear order
emerges, and the size of the second largest component is at most of logarithmic
order.



1 Introduction 

The random intersection graph, denoted $\mathcal{G}^{(n)}_{m,p}$, with a set of vertices
$V = \{v_1, \dots, v_n\}$ and a set of edges $E$, is constructed from a bipartite graph
$\mathcal{B}^{(n)}_{m,p}$ with two sets of vertices: $V$, identical to those of
$\mathcal{G}^{(n)}_{m,p}$, and $A = \{a_1, \dots, a_m\}$, which we call auxiliary vertices.
Edges in $\mathcal{B}^{(n)}_{m,p}$ between vertices and auxiliary vertices are included
independently with probability $p \in [0, 1]$. An edge between two vertices $v_i$ and $v_j$
in $\mathcal{G}^{(n)}_{m,p}$ is present in $E$ exactly when both $v_i$ and $v_j$ are adjacent
to some common auxiliary vertex $a_k$ in $\mathcal{B}^{(n)}_{m,p}$. Along the lines of
Karoński et al. [5] we set $m := \lfloor \beta n^{\alpha} \rfloor$ and
$p := \gamma n^{-(1+\alpha)/2}$, where $\alpha, \beta, \gamma > 0$, to obtain an interesting
graph structure and bounded average vertex degree. For random (multi)graphs, the vertex
degree distribution is defined as the distribution of the degree, i.e. the number of incident
edges, of a vertex chosen uniformly at random. As has been shown by Stark [7], the vertex
degree distribution of the random intersection graph is highly dependent on the value of
$\alpha$, but as shown by Deijfen and Kets [3], the clustering is tunable only when
$\alpha = 1$. In a recent paper by Behrisch [2], the component structure of the random
intersection graph is studied for $\alpha \neq 1$ and $\beta = 1$, and the aim of the present
note is to describe the component structure when $\alpha = 1$. We will henceforth keep
$\beta$ and $\gamma$ fixed and positive, and sometimes suppress the dependency on these
parameters in the notation: $\mathcal{G}^{(n)}$.
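The construction above can be sketched in a few lines of Python. This is a minimal illustration for the case $\alpha = 1$, so that $m = \lfloor \beta n \rfloor$ and $p = \gamma/n$; the function name `random_intersection_graph` and the parameter values are assumptions chosen for the example, not part of the original note.

```python
import random

def random_intersection_graph(n, beta, gamma, seed=None):
    """Sample the graph for alpha = 1: m = floor(beta*n) auxiliary vertices, p = gamma/n."""
    rng = random.Random(seed)
    m = int(beta * n)          # floor, since beta * n > 0
    p = gamma / n
    adj = {v: set() for v in range(n)}
    for _ in range(m):
        # vertices adjacent to this auxiliary vertex in the bipartite graph
        group = [v for v in range(n) if rng.random() < p]
        # vertices sharing an auxiliary vertex become pairwise adjacent
        for u in group:
            for w in group:
                if u != w:
                    adj[u].add(w)
    return adj

# Hypothetical parameters: beta = 1, gamma = 1.5, so beta * gamma**2 = 2.25.
adj = random_intersection_graph(n=1000, beta=1.0, gamma=1.5, seed=1)
mean_degree = sum(len(nb) for nb in adj.values()) / len(adj)
print(mean_degree)  # expected to be in the vicinity of beta * gamma**2
```

Storing neighbours in sets coalesces multiple shared auxiliary vertices into a single edge, which is what distinguishes the graph from the corresponding multigraph.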

2 The degree distribution 

We define $D(m,n,p)$ to be a random variable with the vertex degree distribution of the
graph $\mathcal{G}^{(n)}_{m,p}$. Stark [3, Thm. 1] showed that the distribution of $D(m,n,p)$
has the following generating function

\[ g_{D(m,n,p)}(z) := E\big[z^{D(m,n,p)}\big] = \sum_{j=0}^{n-1} \binom{n-1}{j} z^{j} (1-z)^{n-1-j} \big[1 - p + p(1-p)^{n-1-j}\big]^{m}. \]

This distribution is from here onwards denoted RIG$(m,n,p)$. Let us define a certain
compound Poisson random variable $Z$ by its generating function

\[ g_Z(s) := E[s^Z] = \exp\{\lambda'(e^{\lambda''(s-1)} - 1)\}, \]

and write $Z \in \mathrm{CPoisson}(\lambda', \lambda'')$. Here $E[Z] = \lambda'\lambda''$.
Another result by Stark [6, Thm. 2], here slightly generalised, is

Lemma 1. If $n'$ and $n''$ are functions of $n$ such that $\lfloor \beta n \rfloor \geq n'$,
$n'/n = \beta + o(1)$, $n \geq n''$, $n''/n = 1 + o(1)$, then
$D(n', n'', \gamma/n) \to \mathrm{CPoisson}(\beta\gamma, \gamma)$ in distribution as $n \to \infty$.
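The convergence in Lemma 1 can be checked numerically at the level of generating functions. The sketch below assumes that the generating function of RIG$(m,n,p)$ has the binomial-mixture form $\sum_{j=0}^{n-1} \binom{n-1}{j} z^j (1-z)^{n-1-j} [1 - p + p(1-p)^{n-1-j}]^m$ and compares it with $\exp\{\beta\gamma(e^{\gamma(z-1)} - 1)\}$ at a single point $z$. The function names and the parameter values are illustrative assumptions.

```python
import math

def rig_pgf(z, m, n, p):
    """E[z^D] for D ~ RIG(m, n, p) with 0 < z < 1, summed in log-space to avoid overflow."""
    total = 0.0
    for j in range(n):
        # log of binom(n-1, j) * z^j * (1-z)^(n-1-j)
        log_weight = (math.lgamma(n) - math.lgamma(j + 1) - math.lgamma(n - j)
                      + j * math.log(z) + (n - 1 - j) * math.log(1.0 - z))
        inner = 1.0 - p + p * (1.0 - p) ** (n - 1 - j)
        total += math.exp(log_weight + m * math.log(inner))
    return total

def cpoisson_pgf(z, lam1, lam2):
    """Generating function of CPoisson(lam1, lam2)."""
    return math.exp(lam1 * (math.exp(lam2 * (z - 1.0)) - 1.0))

beta, gamma, z = 1.0, 1.5, 0.5  # hypothetical values
for n in (100, 1000, 4000):
    exact = rig_pgf(z, int(beta * n), n, gamma / n)
    limit = cpoisson_pgf(z, beta * gamma, gamma)
    print(n, exact, limit)
```

The printed pairs should approach each other as $n$ grows; a single evaluation point is of course only a sanity check, not a proof of convergence in distribution.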

This can be shown by inspecting the generating functions. In particular

\[ E[D(m,n,p)] = g'_{D(m,n,p)}(1) = (n-1)\big[1 - (1-p^2)^m\big], \]
\[ E[D(m,n,p)(D(m,n,p)-1)] = g''_{D(m,n,p)}(1) = (n-1)(n-2)\big[1 - 2(1-p^2)^m + (1 - p^2(2-p))^m\big], \]

and with $n'$ and $n''$ as in the lemma we can deduce $E[D(n', n'', \gamma/n)] = \mu + o(1) = O(1)$,
where $\mu := \beta\gamma^2$, and $E[D(n', n'', \gamma/n)^2] = \mu(1 + \mu + \gamma) + o(1) = O(1)$. We write

\[ g(s) := \exp\{\beta\gamma(e^{\gamma(s-1)} - 1)\} \]

for the generating function of the limiting distribution $\mathrm{CPoisson}(\beta\gamma, \gamma)$.
Finally, let us define $\rho$ to be the smallest non-negative root of $\rho = g(\rho)$.
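The smallest non-negative root of the fixed-point equation for $g(s) = \exp\{\beta\gamma(e^{\gamma(s-1)} - 1)\}$ is easy to approximate numerically. A minimal sketch, relying on the standard fact that iterating an increasing generating function from $0$ converges monotonically to its smallest non-negative fixed point; the function names, tolerance and parameter values are assumptions made for the example.

```python
import math

def g(s, beta, gamma):
    """Generating function of the limiting distribution CPoisson(beta*gamma, gamma)."""
    return math.exp(beta * gamma * (math.exp(gamma * (s - 1.0)) - 1.0))

def smallest_root(beta, gamma, tol=1e-12, max_iter=100000):
    """Iterate r <- g(r) starting from 0 until successive iterates differ by less than tol."""
    r = 0.0
    for _ in range(max_iter):
        nxt = g(r, beta, gamma)
        if abs(nxt - r) < tol:
            return nxt
        r = nxt
    return r

# Hypothetical parameters: beta * gamma**2 = 4 in the first case, 0.25 in the second.
print(smallest_root(1.0, 2.0))
print(smallest_root(1.0, 0.5))
```

When $\beta\gamma^2 \leq 1$ the iteration converges to $1$, the only root in $[0,1]$; when $\beta\gamma^2 > 1$ it stops at a root strictly between $0$ and $1$. Convergence becomes slow when $\beta\gamma^2$ is close to one.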
3 Results



Theorem 2. Let $\mu := \beta\gamma^2$, i.e. the asymptotic expected degree of a randomly chosen
vertex of $\mathcal{G}^{(n)}$.

(i) If $\mu < 1$, then there is a.a.s. no connected component in $\mathcal{G}^{(n)}$ with more
than $O(\log n)$ vertices.

(ii) If $\mu > 1$, then $0 < \rho < 1$ and there exists a unique giant component of size
$(1 - \rho + o_p(1))n$, and the size of the second largest connected component is a.a.s. no larger
than $O(\log n)$.

With $W_n = o_p(a_n)$ we mean that $W_n/a_n \to 0$ in probability as $n \to \infty$. As mentioned
in the introduction, Behrisch has investigated the component structure for the random
intersection graph when $\alpha \neq 1$, see [2, Thm. 1]. It is worth noting that the results in
Theorem 2 in this note are closer to the results that Behrisch obtained for the case $\alpha > 1$
than for $\alpha < 1$. For $\alpha < 1$ the size of the giant component is no longer linear in $n$.
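The two regimes of Theorem 2 can be observed in a small simulation. The sketch below is an illustration under assumed parameter values, not part of the original argument: it samples the bipartite structure with $m = \lfloor \beta n \rfloor$ and $p = \gamma/n$ and merges all vertices that share an auxiliary vertex with a union-find structure, which gives the component sizes without building the edge set. The function name `largest_component` is hypothetical.

```python
import random
from collections import Counter

def largest_component(n, beta, gamma, seed=None):
    """Size of the largest component of one sampled graph with m = floor(beta*n), p = gamma/n."""
    rng = random.Random(seed)
    m, p = int(beta * n), gamma / n
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for _ in range(m):
        # vertices adjacent to this auxiliary vertex end up in one component
        group = [v for v in range(n) if rng.random() < p]
        for v in group[1:]:
            ra, rb = find(group[0]), find(v)
            if ra != rb:
                parent[rb] = ra
    sizes = Counter(find(v) for v in range(n))
    return max(sizes.values())

n = 1000
print(largest_component(n, 1.0, 0.5, seed=0))  # beta * gamma**2 = 0.25
print(largest_component(n, 1.0, 2.0, seed=0))  # beta * gamma**2 = 4
```

For the first parameter choice the largest component should be small compared with $n$, while for the second a component containing a positive fraction of the vertices should appear.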



4 Proof of Theorem 2 



For the remainder of this note we will follow the notation and steps of the proof of 
Theorem 5.4 in [U Ch. 5.2]. Therefore most of the details that have not been altered from 
the original proof will be omitted. The proof is based on choosing a vertex at random 
from V, say v, and exploring its component, say C(v). We start by visiting the chosen 
vertex v and identifying its neighbours. Then we proceed by visiting an identified but 
unvisited vertex, if any remains, and identify its neighbours, and repeat this procedure 
until all vertices in the component have been visited. Let $X_i$ denote the number of newly
identified vertices at the $i$th step of this exploration process. The event $\{|C(v)| = k\}$ is
equivalent to $\sum_{i=1}^{k} X_i = k - 1$, and it is thus important to understand the growth of this
partial sums process. The random variables $X_1, X_2, \dots$ are not i.i.d. but the partial sums
process can nevertheless be related to other partial sums processes with i.i.d. summands
so that we obtain bounds on events of the type above. We will need the following result
for these partial sums processes.

Lemma 3. Let $\delta > 0$ and $X := X_1 + \dots + X_k$, where $X_1, \dots, X_k$ are i.i.d. as
$D(n', n'', \gamma/n)$ of Lemma 1. Then, for large enough $n$, there exists a positive constant
$C := C(\beta, \gamma, \delta)$ such that $P(X > (1 + \delta)\mu k) < e^{-Ck}$ and
$P(X < (1 - \delta)\mu k) < e^{-Ck}$.

Remark. This bound on the tail probabilities works since $X$ is a sum of $k$ independent
random variables. As $n \to \infty$, the RIG-distribution of the summands does not change
much: It is more or less $\mathrm{CPoisson}(\beta\gamma, \gamma)$, which is a "well behaved"
distribution, and as $k$ increases, we expect an exponential decay of probabilities away from
the mean of the sum. Since $C$ is not further specified, this bound is only useful as $k$ tends
to infinity, which it may or may not do as a function of $n$.
Before we prove the lemma, note that we can construct a multigraph $\mathcal{H}^{(n)}_{m,p}$ from the
same bipartite graph $\mathcal{B}^{(n)}_{m,p}$ as we used in the construction of
$\mathcal{G}^{(n)}_{m,p}$, by letting the number of edges between $v_i$ and $v_j$ equal the number
of auxiliary vertices that are adjacent to both $v_i$ and $v_j$. We denote with RIMG$(m,n,p)$ the
degree distribution of $\mathcal{H}^{(n)}_{m,p}$. RIMG$(m,n,p)$ clearly dominates RIG$(m,n,p)$, as
we can obtain $\mathcal{G}^{(n)}_{m,p}$ from $\mathcal{H}^{(n)}_{m,p}$ by coalescing multiple
edges between vertices into one single edge. RIMG$(m,n,p)$ is a compound binomial
distribution with generating function

\[ h(z) = \big(1 - p + p(1 - p + pz)^{n-1}\big)^{m}, \]

since, by construction, a vertex $v_i \in \mathcal{H}^{(n)}_{m,p}$ is connected to a
Binomial$(m,p)$ number of auxiliary vertices, each of which is connected to an independent
Binomial$(n-1,p)$ number of vertices in $V \setminus \{v_i\}$.

The expected value of RIMG$(\lfloor \beta n \rfloor, n, \gamma/n)$ is thus
$\lfloor \beta n \rfloor (n-1)\gamma^2/n^2 = \mu + O(1/n) = E[D(\lfloor \beta n \rfloor, n, \gamma/n)] + O(1/n)$,
so the expected difference in vertex degree between the multigraph and the ordinary graph is only
$O(1/n)$. With $\eta^{(n)}$ defined as the difference in the total number of edges in
$\mathcal{H}^{(n)}_{\lfloor \beta n \rfloor, \gamma/n}$ and
$\mathcal{G}^{(n)}_{\lfloor \beta n \rfloor, \gamma/n}$, we have $E[\eta^{(n)}] = O(1)$, by summing
over all vertices. This will be used in the proof of Theorem 2(ii).
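The $O(1/n)$ gap between the two expected degrees can be made concrete. The sketch below evaluates $m(n-1)p^2$ for the multigraph and $(n-1)[1 - (1-p^2)^m]$ for the ordinary graph with $m = \lfloor \beta n \rfloor$ and $p = \gamma/n$; expanding $(1-p^2)^m$ to second order suggests a gap of about $\mu^2/(2n)$. The function names and parameter values are assumptions made for the illustration.

```python
import math

def mean_rig(m, n, p):
    """(n - 1) * [1 - (1 - p^2)^m], using expm1/log1p for numerical stability."""
    return (n - 1) * -math.expm1(m * math.log1p(-p * p))

def mean_rimg(m, n, p):
    """Mean of the compound binomial degree in the multigraph: m * p * (n - 1) * p."""
    return m * (n - 1) * p * p

beta, gamma = 1.0, 1.5          # hypothetical values, mu = 2.25
mu = beta * gamma ** 2
for n in (10**3, 10**4, 10**5):
    m, p = int(beta * n), gamma / n
    gap = mean_rimg(m, n, p) - mean_rig(m, n, p)
    print(n, n * gap, mu ** 2 / 2)
```

Since every surplus edge contributes to two vertex degrees, summing the gap over the $n$ vertices and halving gives the expected number of surplus edges in the multigraph, which stays bounded as $n$ grows.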

Proof of Lemma 3. Note that $E[e^{\theta X}] = E[e^{\theta(X_1 + \dots + X_k)}] = E[e^{\theta X_1}]^k$.
Let $s > 0$. We have

\[ P(X < (1-\delta)\mu k) = P\big(e^{-sX} > e^{-s(1-\delta)\mu k}\big) \leq e^{s(1-\delta)\mu k} E[e^{-sX}] = \big(e^{s\mu - s\delta\mu} E[e^{-sX_1}]\big)^k, \tag{1} \]
\[ P(X > (1+\delta)\mu k) = P\big(e^{sX} > e^{s(1+\delta)\mu k}\big) \leq e^{-s(1+\delta)\mu k} E[e^{sX}] = \big(e^{-s\mu - s\delta\mu} E[e^{sX_1}]\big)^k, \tag{2} \]

by Markov's inequality. Since $e^{-sX_1} \leq 1 - sX_1 + \tfrac{1}{2}s^2X_1^2$ for $s > 0$,

\[ \begin{aligned}
E[e^{-sX_1}] &\leq 1 - sE[X_1] + \tfrac{1}{2}s^2E[X_1^2] = \exp\{\log(1 - sE[X_1] + \tfrac{1}{2}s^2E[X_1^2])\} \\
&= \exp\{-sE[X_1] + O(s^2)\} = \exp\{-s(\mu + o(1) + O(s))\}.
\end{aligned} \]

The right hand side of (1) is thus at most $\exp\{-s(\delta\mu + o(1) + O(s))k\}$, and we can fix a small
$s$, such that for large enough $n$, $s(\delta\mu + o(1) + O(s))$ is positive (regardless of the value of
$k$), and thus $P(X < (1-\delta)\mu k) < e^{-C'k}$ for some positive $C'$.

For the second part of the proof, let $\tilde{X} \in \mathrm{RIMG}(n', n'', \gamma/n)$, so that
$\tilde{X}$ stochastically dominates $X_1$ and

\[ \begin{aligned}
E[e^{sX_1}] \leq E[e^{s\tilde{X}}] &= \Big(1 - \frac{\gamma}{n} + \frac{\gamma}{n}\Big(1 - \frac{\gamma}{n} + \frac{\gamma}{n}e^{s}\Big)^{n''-1}\Big)^{n'} \\
&\leq \exp\Big\{\gamma\frac{n'}{n}\Big(e^{\gamma\frac{n''-1}{n}(e^{s}-1)} - 1\Big)\Big\} \\
&= \exp\big\{\gamma(\beta + o(1))\big(e^{\gamma(1+o(1))s(1+O(s))} - 1\big)\big\} \\
&= \exp\big\{\gamma\beta(1 + o(1))\big(e^{\gamma s(1+o(1)+O(s))} - 1\big)\big\} \\
&= \exp\{\mu s(1 + o(1) + O(s))\}.
\end{aligned} \]

The right hand side of (2) is thus less than $\exp\{-s(\delta\mu + o(1) + O(s))k\}$, and we can fix
a small $s$, such that for large enough $n$, $s(\delta\mu + o(1) + O(s))$ is positive (regardless of the
value of $k$), and thus $P(X > (1+\delta)\mu k) < e^{-C''k}$ for some positive $C''$. We conclude the
proof of the lemma by letting $C = \min\{C', C''\}$. □
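To get a feel for the size of the constant in Lemma 3, one can optimise the two Markov bounds over $s$ numerically. The sketch below is only a heuristic: it replaces the finite-$n$ summand distribution by its limit $\mathrm{CPoisson}(\beta\gamma, \gamma)$, whose moment generating function is $\exp\{\beta\gamma(e^{\gamma(e^{s}-1)} - 1)\}$, and uses a plain grid search. The function names, the grid and the parameter values are assumptions.

```python
import math

def log_mgf(s, beta, gamma):
    """log E[e^{sZ}] for Z ~ CPoisson(beta*gamma, gamma)."""
    return beta * gamma * (math.exp(gamma * (math.exp(s) - 1.0)) - 1.0)

def chernoff_exponents(beta, gamma, delta, s_max=3.0, steps=3000):
    """Grid search for the best exponents in the upper- and lower-tail bounds."""
    mu = beta * gamma ** 2
    upper = lower = 0.0
    for i in range(1, steps + 1):
        s = s_max * i / steps
        # upper tail: P(X > (1+delta) mu k) <= exp{-k [s (1+delta) mu - log M(s)]}
        upper = max(upper, s * (1 + delta) * mu - log_mgf(s, beta, gamma))
        # lower tail: P(X < (1-delta) mu k) <= exp{-k [-s (1-delta) mu - log M(-s)]}
        lower = max(lower, -s * (1 - delta) * mu - log_mgf(-s, beta, gamma))
    return upper, lower

upper, lower = chernoff_exponents(beta=1.0, gamma=1.5, delta=0.5)  # hypothetical values
print(upper, lower, min(upper, lower))
```

Both exponents come out strictly positive for $\delta > 0$, in line with the lemma; the smaller of the two plays the role of $C$ in this limiting approximation.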

Proof of Theorem 2(i). The process of exploring vertices that was briefly described in the
beginning of Section 4 implies that $X_1$, the number of neighbours of the initially picked
vertex, has distribution RIG$(\lfloor \beta n \rfloor, n, \gamma/n)$. This, together with the fact
that vertices only can be newly identified once, implies that $\sum_{i=1}^{k} X_i$ is stochastically
dominated by $\sum_{i=1}^{k} X_i^{+}$ for all $k$, where $X_1^{+}, X_2^{+}, \dots$
are i.i.d. RIG$(\lfloor \beta n \rfloor, n, \gamma/n)$. Thus

\[ P(\exists i : |C(v_i)| \geq k) \leq \sum_{i=1}^{n} P(|C(v_i)| \geq k) = nP(|C(v)| \geq k) \leq nP\Big(\sum_{j=1}^{k} X_j^{+} \geq k - 1\Big). \]

Now we take $k := k(n)$ increasing to infinity with $n$. Since all $X_j^{+}$ are i.i.d.
RIG$(\lfloor \beta n \rfloor, n, \gamma/n)$, Lemma 3 applies to $X^{+} := \sum_{j=1}^{k} X_j^{+}$,
which gives us

\[ \begin{aligned}
P(\exists i : |C(v_i)| \geq k) &\leq nP(X^{+} \geq k - 1) = nP(X^{+} \geq (1 + 2\delta)\mu k - 1) \\
&\leq nP(X^{+} > (1 + \delta)\mu k) \leq n\exp\{-Ck\},
\end{aligned} \]

where $\mu < 1$, $\delta = (1/\mu - 1)/2 > 0$, $C$ is defined as in Lemma 3, and the penultimate
inequality follows from $\delta\mu k > 1$ for large enough $k$. That is, if
$k(n) := \lceil (1 + \varepsilon)\log n / C \rceil$, $\varepsilon > 0$, then
$P(\exists i : |C(v_i)| \geq k) \leq n^{-\varepsilon} \to 0$ as $n \to \infty$, and the first part of
Theorem 2 is proved. □
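The exploration process that underlies the proof can be sketched as follows. This is a minimal breadth-first version, assuming the graph is given as a dictionary of neighbour sets; the function name `explore` and the toy graph are hypothetical.

```python
from collections import deque

def explore(adj, v):
    """Return [X_1, X_2, ...]: the number of newly identified vertices at each step."""
    identified = {v}
    queue = deque([v])           # identified but not yet visited
    newly = []
    while queue:
        u = queue.popleft()      # visit u
        fresh = [w for w in sorted(adj[u]) if w not in identified]
        identified.update(fresh)
        queue.extend(fresh)
        newly.append(len(fresh))
    return newly

# Toy graph: a path 1 - 0 - 2 - 3 and an isolated vertex 4.
adj = {0: {1, 2}, 1: {0}, 2: {0, 3}, 3: {2}, 4: set()}
xs = explore(adj, 0)
print(xs, len(xs), sum(xs))
```

The process stops after exactly $|C(v)|$ steps, at which point the $X_i$ sum to $|C(v)| - 1$, since every vertex of the component other than $v$ is newly identified exactly once.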

Proof of Theorem 2(ii). We will first show that, with probability tending to one, there are
no clusters of size $k$ with $O(\log n) = k_-(n) \leq k \leq k_+(n) := n^{2/3}$. From now on, let
$k_- \leq k \leq k_+$, where $k_-(n)$ will be specified shortly. The construction used in the proof is
similar to, but more involved than, the one of the first part of the proof.

For the remainder of the proof we will implicitly condition on the event $\{\eta^{(n)} < \sqrt{n}\}$,
whose probability tends to one when $n \to \infty$, by Markov's inequality and the fact that
$E[\eta^{(n)}] = O(1)$. Our construction fails on the complementary event, but this is of no
consequence for the proof, since the probability of this event tends to zero.

Let $A(v)$ be the event that the exploration process, initiated at $v$, at step $k_+$ has not
terminated and that it at that step has identified fewer than $(\mu - 1)k_+/2$ vertices that have
not yet been visited, i.e.
$A(v) = \{k_+ \leq \sum_{j=1}^{k_+} X_j < k_+ - 1 + (\mu - 1)k_+/2\}$. Let $B(v)$ be
the event $\{\sum_{j=1}^{k_+} X_j < k_+ - 1 + (\mu - 1)k_+/2\}$. We will prove that the probability
that the exploration process terminates after $k$ steps or that $A(v)$ holds for some $v$, tends to zero.
Note that $\{|C(v)| = k\} \subset B(v)$ for each $k$, and in particular
$\{|C(v)| = k_+\} \cup A(v) \subset B(v)$.
We also have $\{|C(v)| = k\} \subset \{\sum_{j=1}^{k} X_j < k - 1 + (\mu - 1)k/2\}$ for each $k$.

On the set $B(v) \cap \{\eta^{(n)} < \sqrt{n}\}$, the exploration process has at step $k$ identified
vertices in $V$ that are adjacent to fewer than $(\mu + 1)k_+/2 + \sqrt{n}$ auxiliary vertices in
$\mathcal{B}^{(n)}_{m,p}$. We claim that



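\[ P\Big(\sum_{j=1}^{k} X_j < k - 1 + \frac{\mu - 1}{2}k\Big) \leq P\Big(\sum_{i=1}^{k} X_i^{-} < k - 1 + \frac{\mu - 1}{2}k\Big) \tag{3} \]

holds with $X_1^{-}, \dots, X_k^{-}$ i.i.d.
RIG$(\lfloor \beta n - (\mu + 1)k_+/2 - \sqrt{n} \rfloor, \lfloor n - (\mu + 1)k_+/2 \rfloor, \gamma/n)$. Note
that $\sum_{i=1}^{k} X_i^{-}$ is not a lower stochastic bound on $\sum_{i=1}^{k} X_i$ in the same way as
$\sum_{i=1}^{k} X_i^{+}$ is an upper bound, since the distribution of $X_i^{-}$ depends on $k_+$.
The claim follows by a slight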
adaptation of the arguments of the proof of Theorem 4.3 in [HI Ch. 4.2]: We compare our 
exploration process with another exploration process, which does not follow vertices that 
belong to a group of forbidden vertices, or that are reached through edges generated by 
a group of forbidden auxiliary vertices. Both groups of forbidden vertices and auxiliary 
vertices are adjusted (diminished) after each step so that there are $(\mu + 1)k_+/2$ vertices
that are forbidden or identified, and $(\mu + 1)k_+/2 + \sqrt{n}$ auxiliary vertices that are forbidden
or have generated an edge to an identified vertex. These adjustments are possible until
the process has identified $(\mu + 1)k_+/2$ vertices, which is long enough to deduce whether
fewer than $k - 1 + (\mu - 1)k/2$ vertices have been identified after step $k$. Furthermore,
since we keep the number of forbidden vertices and auxiliary vertices fixed, the number
of newly identified vertices by the modified exploration process will in each step be i.i.d.
RIG$(\lfloor \beta n - (\mu + 1)k_+/2 - \sqrt{n} \rfloor, \lfloor n - (\mu + 1)k_+/2 \rfloor, \gamma/n)$.

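Using (3), assuming that $\{\eta^{(n)} < \sqrt{n}\}$ holds, gives us that

\[ \begin{aligned}
P(\exists i : \{k_- \leq |C(v_i)| \leq k_+\} \cup A(v_i)) &\leq nP(\{k_- \leq |C(v)| \leq k_+\} \cup A(v)) \\
&= n\Big(\sum_{k=k_-}^{k_+ - 1} P(|C(v)| = k) + P(|C(v)| = k_+) + P(A(v))\Big) \\
&\leq n\sum_{k=k_-}^{k_+} P\Big(\sum_{j=1}^{k} X_j < k - 1 + \frac{\mu - 1}{2}k\Big) \\
&\leq n\sum_{k=k_-}^{k_+} P\Big(\sum_{j=1}^{k} X_j^{-} < k - 1 + \frac{\mu - 1}{2}k\Big).
\end{aligned} \]

We apply Lemma 3 to $X^{-} := \sum_{j=1}^{k} X_j^{-}$, which yields

\[ \begin{aligned}
P(\exists i : \{k_- \leq |C(v_i)| \leq k_+\} \cup A(v_i)) &\leq n\sum_{k=k_-}^{k_+} P\Big(X^{-} < k - 1 + \frac{\mu - 1}{2}k\Big) \\
&= n\sum_{k=k_-}^{k_+} P\big(X^{-} < (1 - \delta)\mu k - 1\big) \\
&\leq n\sum_{k=k_-}^{k_+} P\big(X^{-} < (1 - \delta)\mu k\big) \leq nk_+\exp\{-Ck_-\},
\end{aligned} \]

where $\mu > 1$, $\delta = (1 - 1/\mu)/2 > 0$ and $C$ is defined as in Lemma 3. Therefore, if
$k_-(n) := \lceil (5/3 + \varepsilon)\log n / C \rceil$, $\varepsilon > 0$, and
$k_+(n) := \lfloor n^{2/3} \rfloor$, then
$P(\exists i : \{k_- \leq |C(v_i)| \leq k_+\} \cup A(v_i)) \leq n^{-\varepsilon} \to 0$ as $n \to \infty$.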

From Section 1 we know that two vertices in $\mathcal{G}^{(n)}$ are not connected if they avoid
being adjacent to the same auxiliary vertex. Thus the probability that two vertices are
not connected is $(1 - \gamma^2/n^2)^{\lfloor \beta n \rfloor}$. Furthermore we know from the previous
calculations that the probability that $A(v)$ holds for some $v$ tends to zero as $n$ tends to
infinity, i.e. if there exist two different components of size $k_+$, they will each have at least
$(\mu - 1)k_+/2$ identified but not yet visited vertices. This implies that the probability that two
components each of size $k_+$ are disjoint after visiting their additional vertices is less than



That is, with probability tending to one, either vertices belong to connected components 
of size less than k_, or to a unique component of size at least k + . 
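To get a feel for how small such a disjointness probability is, consider a simplified situation: two fixed disjoint vertex sets of size $r$ in an unconditioned graph with $m = \lfloor \beta n \rfloor$ and $p = \gamma/n$. This is an assumption made for illustration and ignores the conditioning on the exploration history used in the proof. An auxiliary vertex is adjacent to at least one vertex of a given $r$-set with probability $1 - (1-p)^r$, independently for the two sets and across auxiliary vertices, so the probability that no auxiliary vertex touches both sets is $[1 - (1 - (1-p)^r)^2]^m$. The function name below is hypothetical.

```python
import math

def prob_no_shared_auxiliary(n, beta, gamma, r):
    """P(no auxiliary vertex is adjacent to both of two fixed disjoint r-sets)."""
    m, p = int(beta * n), gamma / n
    hit = -math.expm1(r * math.log1p(-p))          # P(an auxiliary vertex touches a given r-set)
    return math.exp(m * math.log1p(-hit * hit))    # independent over the m auxiliary vertices

n, beta, gamma = 10**6, 1.0, 1.5                   # hypothetical values, mu = 2.25
mu = beta * gamma ** 2
k_plus = math.floor(n ** (2.0 / 3.0))
r = int((mu - 1) * k_plus / 2)
prob = prob_no_shared_auxiliary(n, beta, gamma, r)
print(r, prob, n ** 2 * prob)
```

In this simplified setting the probability is roughly $\exp\{-\mu r^2/n\}$, which for $r$ of order $n^{2/3}$ decays like $\exp\{-c\,n^{1/3}\}$ and therefore survives a union bound over all pairs of vertices.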

To show that the size of the largest component grows linearly in $n$ with high probability,
we need to show that the number of vertices that belong to small components,
i.e. components of size $k_-$ or less, is strictly less than $n$, implying the remaining vertices
belong to the giant component. Let $L_i := \{|C(v_i)| \leq k_-\}$, $Y_i := 1_{L_i}$, and set
$Y := \sum_{i=1}^{n} Y_i$, so that $E[Y] = nE[Y_i] = nP(L_i)$. By the same reasoning we use above, we
can sandwich $P(L_i)$ between $P(C^{+} \leq k_-)$ and $P(C^{-} \leq k_-)$, where $C^{+}$ and $C^{-}$
are the total sizes of branching processes with offspring distributed as $X_1^{+}$ and $X_1^{-}$,
respectively. Lemma 1 implies that both offspring distributions tend to the same limit,
$\mathrm{CPoisson}(\beta\gamma, \gamma)$, as $n$ tends to infinity. By standard results in branching
process theory, see Athreya and Ney [1, Thm. 1.5.1], both probabilities $P(C^{+} \leq k_-)$ and
$P(C^{-} \leq k_-)$ tend to the $\rho$ that we defined as the smallest non-negative root of
$g(\rho) = \rho$, since $k_-(n)$ tends to infinity with $n$ and $\rho$ is the probability that the
branching process with offspring distribution $\mathrm{CPoisson}(\beta\gamma, \gamma)$ has
finite total size. It also holds that $0 < \rho < 1$, since $\mu > 1$. Due to this,
$E[Y] = (\rho + o(1))n$, which implies that the expected size of the largest component is
$(1 - \rho + o(1))n$, and the proof that $Y$ is concentrated around $\rho n$ follows the last part of
the proof of Theorem 1.(2) in Behrisch [2, Sec. 4.2, p. 8] verbatim. □
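The branching-process step can be illustrated by simulation. The sketch below runs a Galton-Watson process with $\mathrm{CPoisson}(\beta\gamma, \gamma)$ offspring (a Poisson$(\beta\gamma)$ number of groups, each contributing an independent Poisson$(\gamma)$ number of children) and estimates the probability that its total size stays below a cap, which should be close to the smallest non-negative root of $s = \exp\{\beta\gamma(e^{\gamma(s-1)} - 1)\}$ when the cap is large. The function names, the cap, the number of runs and the parameter values are assumptions; the Poisson sampler is Knuth's multiplication method, chosen only to keep the example dependency-free.

```python
import math
import random

def poisson(rng, lam):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def total_size_at_most(rng, beta, gamma, cap):
    """Run one branching process; return True if its total size never exceeds cap."""
    alive, total = 1, 1
    while alive > 0:
        alive -= 1
        groups = poisson(rng, beta * gamma)
        children = poisson(rng, gamma * groups)  # sum of `groups` i.i.d. Poisson(gamma)
        alive += children
        total += children
        if total > cap:
            return False
    return True

def estimate_small_probability(beta, gamma, cap=200, runs=4000, seed=0):
    rng = random.Random(seed)
    hits = sum(total_size_at_most(rng, beta, gamma, cap) for _ in range(runs))
    return hits / runs

print(estimate_small_probability(1.0, 2.0))  # hypothetical parameters, mu = 4
```

The cap plays the role of $k_-$: because finite trees are rarely large, truncating at a moderate cap already gives an estimate close to the extinction probability.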